Vulnerability Score


Breaking Agent Backbones: Evaluating the Security of Backbone LLMs in AI Agents

Bazinska, Julia, Mathys, Max, Casucci, Francesco, Rojas-Carulla, Mateo, Davies, Xander, Souly, Alexandra, Pfister, Niklas

arXiv.org Artificial Intelligence

AI agents powered by large language models (LLMs) are being deployed at scale, yet we lack a systematic understanding of how the choice of backbone LLM affects agent security. The non-deterministic, sequential nature of AI agents complicates security modeling, while the integration of traditional software with AI components entangles novel LLM vulnerabilities with conventional security risks. Existing frameworks address these challenges only partially: they either capture specific vulnerabilities in isolation or require modeling of complete agents. To address these limitations, we introduce threat snapshots: a framework that isolates specific states in an agent's execution flow where LLM vulnerabilities manifest, enabling the systematic identification and categorization of security risks that propagate from the LLM to the agent level. We apply this framework to construct the $\operatorname{b}^3$ benchmark, a security benchmark based on 194,331 unique crowdsourced adversarial attacks. We then evaluate 31 popular LLMs with it, revealing, among other insights, that enhanced reasoning capabilities improve security, while model size does not correlate with security. We release our benchmark, dataset, and evaluation code to facilitate widespread adoption by LLM providers and practitioners, offering guidance for agent developers and incentivizing model developers to prioritize backbone security improvements.
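The abstract describes a threat snapshot as a frozen state in an agent's execution flow where an LLM vulnerability can manifest, which is then replayed against candidate backbone models. A minimal sketch of that idea follows; the field names, `evaluate` signature, and judge interface are illustrative assumptions, not the paper's actual schema.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of what a "threat snapshot" might capture: a frozen
# point in an agent's execution where attacker-controlled content reaches
# the backbone LLM. Field names are illustrative, not the paper's schema.
@dataclass
class ThreatSnapshot:
    system_prompt: str                                 # agent instructions at this state
    conversation: list = field(default_factory=list)   # messages exchanged so far
    tool_results: list = field(default_factory=list)   # attacker-reachable tool output
    attack_payload: str = ""                           # injected adversarial content
    success_criterion: str = ""                        # what counts as a security failure

def evaluate(llm_call, snapshot, judge):
    """Replay the snapshot against a candidate backbone LLM and judge the output.

    `llm_call` maps a snapshot to a model response; `judge` decides whether
    the response satisfies the snapshot's success criterion (attack succeeded).
    """
    response = llm_call(snapshot)
    return judge(response, snapshot.success_criterion)
```

Because a snapshot is self-contained, the same frozen state can be replayed against many backbone LLMs, which is what allows per-model security comparisons without re-running full agents.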


Group-Adaptive Adversarial Learning for Robust Fake News Detection Against Malicious Comments

Tong, Zhao, Gong, Chunlin, Gu, Yimeng, Shi, Haichao, Liu, Qiang, Wu, Shu, Zhang, Xiao-Yu

arXiv.org Artificial Intelligence

The spread of fake news online distorts public judgment and erodes trust in social media platforms. Although recent fake news detection (FND) models perform well in standard settings, they remain vulnerable to adversarial comments (authored by real users or by large language models (LLMs)) that subtly shift model decisions. In view of this, we first present a comprehensive evaluation of comment attacks on existing fake news detectors and then introduce a group-adaptive adversarial training strategy to improve the robustness of FND models. Specifically, our approach comprises three steps: (1) dividing adversarial comments into three psychologically grounded categories: perceptual, cognitive, and societal; (2) generating diverse, category-specific attacks via LLMs to enhance adversarial training; and (3) applying a Dirichlet-based adaptive sampling mechanism (InfoDirichlet Adjusting Mechanism) that dynamically adjusts the learning focus across the comment categories during training. Experiments on benchmark datasets show that our method maintains strong detection accuracy while substantially increasing robustness to a wide range of adversarial comment perturbations.
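Step (3) above, the Dirichlet-based adaptive sampling, can be sketched as follows. The abstract does not give the update rule, so the loss-proportional concentration update below is an assumption for illustration; only the three category names come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# The three psychologically grounded categories named in the abstract.
CATEGORIES = ["perceptual", "cognitive", "societal"]

def sample_category_mix(alpha, rng):
    """Draw per-category sampling proportions from a Dirichlet distribution."""
    return rng.dirichlet(alpha)

def update_alpha(alpha, per_category_loss, lr=1.0):
    """Shift concentration mass toward higher-loss categories, so harder
    categories are sampled more often (an assumed update rule)."""
    loss = np.asarray(per_category_loss, dtype=float)
    return alpha + lr * loss / loss.sum()

alpha = np.ones(len(CATEGORIES))  # uniform prior over the three categories
for step in range(3):
    mix = sample_category_mix(alpha, rng)
    # ... train on a batch whose category composition follows `mix` ...
    per_category_loss = np.array([0.9, 0.4, 0.2])  # placeholder losses
    alpha = update_alpha(alpha, per_category_loss)
```

Sampling the mix (rather than using the normalized losses directly) keeps exploration across all categories while still biasing training toward the currently hardest attack type.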


A Framework for Evaluating Vision-Language Model Safety: Building Trust in AI for Public Sector Applications

Rashid, Maisha Binte, Rivas, Pablo

arXiv.org Artificial Intelligence

Vision-Language Models (VLMs) are increasingly deployed in public sector missions, necessitating robust evaluation of their safety and vulnerability to adversarial attacks. This paper introduces a novel framework to quantify adversarial risks in VLMs. We analyze model performance under Gaussian, salt-and-pepper, and uniform noise, identifying misclassification thresholds and deriving composite noise patches and saliency patterns that highlight vulnerable regions. These patterns are compared against the Fast Gradient Sign Method (FGSM) to assess their adversarial effectiveness. We propose a new Vulnerability Score that combines the impact of random noise and adversarial attacks, providing a comprehensive metric for evaluating model robustness.
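The proposed Vulnerability Score combines degradation under random noise (Gaussian, salt-and-pepper, uniform) with degradation under FGSM attacks. The abstract does not state the exact formula, so the equally weighted combination of relative accuracy drops below is purely an illustrative assumption.

```python
import numpy as np

def accuracy_drop(acc_clean, acc_perturbed):
    """Relative accuracy drop caused by a perturbation, clipped to [0, 1]."""
    return max(0.0, (acc_clean - acc_perturbed) / acc_clean)

def vulnerability_score(acc_clean, acc_noise, acc_fgsm, w_noise=0.5, w_adv=0.5):
    """Composite score in [0, 1]: higher means more vulnerable.

    acc_noise: accuracies under the random-noise conditions
    (e.g. Gaussian, salt-and-pepper, uniform); acc_fgsm: accuracy
    under FGSM. The 50/50 weighting is an assumption.
    """
    noise_term = np.mean([accuracy_drop(acc_clean, a) for a in acc_noise])
    adv_term = accuracy_drop(acc_clean, acc_fgsm)
    return w_noise * noise_term + w_adv * adv_term

# Example: clean accuracy 0.92; accuracies under the three noise types
# and under FGSM (all numbers invented for illustration).
score = vulnerability_score(0.92, [0.80, 0.75, 0.85], 0.40)
```

Normalizing by clean accuracy makes scores comparable across models with different baseline performance, which is what lets the metric rank robustness rather than raw accuracy.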


A Novel Approach To User Agent String Parsing For Vulnerability Analysis Using Multi-Headed Attention

Nandakumar, Dhruv, Murli, Sathvik, Khosla, Ankur, Choi, Kevin, Rahman, Abdul, Walsh, Drew, Riede, Scott, Dull, Eric, Bowen, Edward

arXiv.org Artificial Intelligence

The increasing reliance on the internet has led to the proliferation of a diverse set of web browsers and operating systems (OSs) capable of browsing the web. User agent strings (UASs) are a component of web browsing that is transmitted with every Hypertext Transfer Protocol (HTTP) request. They contain information about the client device and software, which web servers use for purposes such as content negotiation and security. However, the lack of standardization of UAS formats makes parsing them a non-trivial task. Current rules-based approaches are often brittle and can fail when encountering non-standard formats. In this work, a novel methodology for parsing UASs using multi-headed attention-based transformers is proposed. The proposed methodology exhibits strong performance in parsing a variety of UASs with differing formats. Furthermore, a framework that uses parsed UASs to estimate vulnerability scores for large sections of publicly visible IT networks or regions is also discussed. The methodology presented here can also be easily extended or deployed for real-time parsing of logs in enterprise settings.
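The core mechanism named above, multi-headed attention over an embedded user agent string, can be sketched in a few lines of NumPy. This is a generic self-attention illustration with random projections standing in for learned weights, not the paper's architecture; embedding size, head count, and the character-level framing are all assumptions.

```python
import numpy as np

def multi_head_attention(x, n_heads, rng):
    """Minimal multi-head scaled dot-product self-attention (NumPy sketch).

    x: (seq_len, d_model) embeddings; d_model must divide evenly by n_heads.
    Random projections stand in for learned Q/K/V weight matrices.
    """
    seq_len, d_model = x.shape
    d_head = d_model // n_heads
    heads = []
    for _ in range(n_heads):
        Wq, Wk, Wv = (rng.standard_normal((d_model, d_head)) for _ in range(3))
        q, k, v = x @ Wq, x @ Wk, x @ Wv
        scores = q @ k.T / np.sqrt(d_head)          # scaled dot-product scores
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
        heads.append(weights @ v)
    return np.concatenate(heads, axis=-1)           # concat heads -> (seq_len, d_model)

rng = np.random.default_rng(0)
# Embed a toy user agent string at the character level (random embeddings).
uas = "Mozilla/5.0"
emb = rng.standard_normal((len(uas), 16))
attended = multi_head_attention(emb, n_heads=4, rng=rng)
```

Because every character position attends to every other, such a model can associate tokens like "Windows NT 10.0" with their surrounding delimiters regardless of where they appear, which is what rules-based parsers with fixed patterns struggle to do.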


Overdose Risk Prediction Algorithms: The Need For A Comprehensive Legal Framework

#artificialintelligence

Risk prediction has permeated many aspects of modern life, including health care. Algorithms developed using advanced statistical methods have been used to identify hospitalized adults at risk of clinical deterioration, reduce hospital readmission rates, and improve resource allocation and health care use. These methods have also been used to develop predictive models for overdose risk among specific patient populations. Most of these overdose-specific applications, however, have been limited to health care settings using health care utilization or insurance claims data. State and local governments are increasingly integrating health- and non-health-sector data for public health purposes, creating an opportunity to use these data to improve overdose risk prediction models.